On Efficient Handling of Continuous Attributes in Large Data Bases
Abstract
Some data mining techniques, such as discretization of continuous attributes and decision tree induction, are based on searching for an optimal partition of data with respect to some optimization criterion. We investigate the problem of searching for an optimal binary partition of a continuous attribute domain for large data sets stored in relational databases (RDB). The critical factor for the time complexity of algorithms solving this problem is the number of database I/O operations needed to construct such partitions. In our approach, the basic operators are defined by queries on the number of objects characterized by real-value intervals of continuous attributes. We assume that the answer time for such queries does not depend on the interval length. The straightforward approach to optimal partition selection (with respect to a given measure) requires O(N) basic queries, where N is the number of preassumed partition parts in the search space. We show properties of the basic optimization measures that make it possible to reduce the size of the search space. Moreover, we prove that, using only O(log N) simple queries, one can construct a partition very close to optimal.
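As a rough illustration of the straightforward approach described in the abstract, the sketch below scores every candidate cut using only interval count queries. It is not the paper's algorithm. The names `CountOracle` and `best_cut` are hypothetical, the database is simulated by sorted in-memory lists, and weighted class entropy is assumed as the optimization measure purely for the example.

```python
from bisect import bisect_left
from math import log2


class CountOracle:
    """Hypothetical stand-in for the database: answers interval count queries."""

    def __init__(self, values, labels):
        # One sorted list of attribute values per decision class.
        self.by_class = {}
        for value, label in zip(values, labels):
            self.by_class.setdefault(label, []).append(value)
        for class_values in self.by_class.values():
            class_values.sort()
        self.queries = 0  # number of basic queries issued so far

    def count(self, cls, lo, hi):
        """Return the number of objects of class `cls` with lo <= value < hi."""
        self.queries += 1
        class_values = self.by_class[cls]
        return bisect_left(class_values, hi) - bisect_left(class_values, lo)


def entropy(counts):
    """Shannon entropy (in bits) of a class-count vector."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(n / total * log2(n / total) for n in counts if n)


def best_cut(oracle, cuts, lo, hi):
    """Exhaustive search: score each candidate cut with count queries only.

    Assumes at least one object lies in [lo, hi). Returns (cut, score),
    where score is the size-weighted entropy of the two sides (lower is better).
    """
    classes = sorted(oracle.by_class)
    best = None
    for cut in cuts:
        left = [oracle.count(c, lo, cut) for c in classes]
        right = [oracle.count(c, cut, hi) for c in classes]
        n = sum(left) + sum(right)
        score = (sum(left) * entropy(left) + sum(right) * entropy(right)) / n
        if best is None or score < best[0]:
            best = (score, cut)
    return best[1], best[0]


if __name__ == "__main__":
    oracle = CountOracle([1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
                         ["a", "a", "a", "b", "b", "b"])
    cut, score = best_cut(oracle, [1.5, 2.5, 5.0, 7.5, 8.5], 0.0, 10.0)
    print(cut, score, oracle.queries)
```

In this sketch the number of queries grows linearly with the number of candidate cuts (two queries per class per cut), which is the cost the abstract aims to reduce.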
Similar resources
Application of Artificial Neural Networks and Support Vector Machines for carbonate pores size estimation from 3D seismic data
This paper proposes a method for the prediction of pore size values in hydrocarbon reservoirs using 3D seismic data. To this end, an actual carbonate oil field in the south-western part of Iran was selected. Taking real geological conditions into account, different models of reservoir were constructed for a range of viable pore size values. Seismic surveying was performed next on these models. F...
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
Analysis of the Concept of Occupational Therapy Handling in Children with Cerebral Palsy: A Hybrid Study
Objective: This study aimed to analyze the concept of occupational therapy handling in children with cerebral palsy from the perspective of occupational therapy instructors and clinicians in Iran. Materials & Methods: This qualitative study used a hybrid model to clarify the concept of handling in three phases. For the theoretical phase, attributes of handling were recognized thr...
Learning of Inexact Rules by the FISH-NET Algorithm from Low Quality Data
We present an algorithm, the FISH-NET algorithm, for deriving classification/forecasting rules from large data bases of low quality data. The attributes are assumed to be continuous, numeric variables. The algorithm works on the field of the attributes, rather than on individual point values, and is linear in both the number of attributes and the number of instances. The algorithm has been tested ...
A DEA-based Approach for Multi-objective Design of Attribute Acceptance Sampling Plans
Acceptance sampling (AS), as one of the main fields of statistical quality control (SQC), involves a system of principles and methods to make decisions about accepting or rejecting a lot or sample. For attributes, the design of a single AS plan generally requires determination of sample size and acceptance number. Numerous approaches have been developed for optimal selection of design parameters...
Journal: Fundam. Inform.
Volume: 48
Publication date: 2001